GPT-5 Rubric Comparison Report

Comparing Original vs Reformulated (v2) vs Reformulated (v3) Rubrics

Total tasks: 677
Unchanged (v2): 9
Reformulated (v2): 668
Unchanged (v3): 0
Reformulated (v3): 677
Segments: 12 (the "changed" counts below refer to the v2 reformulation)
compositional_tasks_v2 (87 tasks, 87 changed)
flights (51 tasks, 51 changed)
hotels_head (52 tasks, 52 changed)
jobs (38 tasks, 38 changed)
price_comparison (57 tasks, 57 changed)
realestate_complex (48 tasks, 48 changed)
recipe_to_shopping (48 tasks, 48 changed)
restaurants_tail (52 tasks, 51 changed)
shopping_head (56 tasks, 56 changed)
shopping_lists_tail (51 tasks, 51 changed)
things_to_do (80 tasks, 72 changed)
ticketing (57 tasks, 57 changed)
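The per-segment figures reconcile with the summary counts. The 12 segment totals sum to 677 tasks, and the "changed" column sums to 668, which matches Reformulated (v2). That leaves 9 unchanged tasks: 1 in restaurants_tail and 8 in things_to_do. The Python sketch below checks this arithmetic. The dictionary name and layout are illustrative, and it assumes the "changed" column counts v2 reformulations, since that is the only reading consistent with the totals.

```python
# Per-segment counts copied from the list above: (total tasks, tasks changed).
# Assumption: "changed" means reformulated in v2 (the totals only reconcile under this reading).
SEGMENTS = {
    "compositional_tasks_v2": (87, 87),
    "flights": (51, 51),
    "hotels_head": (52, 52),
    "jobs": (38, 38),
    "price_comparison": (57, 57),
    "realestate_complex": (48, 48),
    "recipe_to_shopping": (48, 48),
    "restaurants_tail": (52, 51),
    "shopping_head": (56, 56),
    "shopping_lists_tail": (51, 51),
    "things_to_do": (80, 72),
    "ticketing": (57, 57),
}

# Aggregate the per-segment columns.
total = sum(t for t, _ in SEGMENTS.values())
changed_v2 = sum(c for _, c in SEGMENTS.values())
unchanged_v2 = total - changed_v2

# Segments that kept at least one original rubric in v2.
unchanged_by_segment = {
    name: t - c for name, (t, c) in SEGMENTS.items() if t != c
}

print(len(SEGMENTS), total, changed_v2, unchanged_v2)
print(unchanged_by_segment)
```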